Abstract



Many real world domains require the representation of a measure of uncertainty. The most
common such representation is probability, and the combination of probability with logic
programs has given rise to the field of Probabilistic Logic Programming (PLP), leading to
languages such as the Independent Choice Logic, Logic Programs with Annotated
Disjunctions (LPADs), ProbLog, PRISM and others. These languages share a similar
distribution semantics, and methods have been devised to translate programs between
these languages. The complexity of computing the probability of queries to these general
PLP programs is very high due to the need to combine the probabilities of explanations
that may not be exclusive. As one alternative, the PRISM system reduces the complexity
of query answering by restricting the form of programs it can evaluate. As an entirely
different alternative, Possibilistic Logic Programs adopt a simpler metric of uncertainty
than probability.

Each of these approaches - general PLP, restricted PLP, and Possibilistic Logic Pro- 
gramming - can be useful in different domains depending on the form of uncertainty to 
be represented, on the form of programs needed to model problems, and on the scale of 
the problems to be solved. In this paper, we show how the PITA system, which originally 
supported the general PLP language of LPADs, can also efficiently support restricted PLP 
and Possibilistic Logic Programs. PITA relies on tabling with answer subsumption and 
consists of a transformation along with an API for library functions that interface with 
answer subsumption. We show that, by adapting its transformation and library functions, 
PITA can be parameterized to PITA(IND,EXC) which supports the restricted PLP of 
PRISM, including optimizations that reduce non-discriminating arguments and the com- 
putation of Viterbi paths. Furthermore, we show PITA to be competitive with PRISM 
for complex queries to Hidden Markov Model examples, and sometimes much faster. We 
further show how PITA can be parameterized to PITA(COUNT) which computes the 
number of different explanations for a subgoal, and to PITA(POSS) which scalably im- 
plements Possibilistic Logic Programming. PITA is a supported package in version 3.3 of 
XSB. 
1 Introduction

Uncertainty, imprecision and vagueness are very important for modeling real world 
domains where facts can often not be ascertained with complete confidence. In the 
field of Logic Programming, there have recently been many efforts to include these 
characteristics, originating whole research fields such as Probabilistic Logic Pro- 
gramming (PLP), Possibilistic Logic Programming and Fuzzy Logic Programming. 
In all three fields many approaches have been proposed for modeling uncertainty, 
imprecision and vagueness, obtaining new languages that are often equipped with 
efficient inference algorithms. 

In Probabilistic Logic Programming, a large number of languages have been
independently proposed. Many of these however follow a common approach, the
distribution semantics (Sato 1995), and in fact there are transformations for
converting a program in one PLP language into another PLP language (Vennekens and
Verbaeten 2003; De Raedt et al.). Examples of such PLP languages are Probabilistic
Logic Programs (Dantsin 1991), Probabilistic Horn Abduction (PHA) (Poole 1993),
Independent Choice Logic (ICL) (Poole 1997), PRISM (Sato 1995), Logic Programs with
Annotated Disjunctions (LPADs) (Vennekens et al. 2004) and ProbLog (De Raedt et al.
2007). Most of these languages impose few restrictions on the type of programs they
can evaluate: ICL, LPADs and others, for instance, have been defined on normal
programs with function symbols. Accordingly, we term systems that evaluate
large classes of PLP programs general PLP systems. However, a great deal of
efficiency and scalability can be obtained by restricting how different explanations
are constructed and combined. Such an approach is adopted by the PRISM system
(Sato et al. 2010), which we refer to as a restricted PLP system. Both general and
restricted PLP systems have advantages in different domains depending on the form
of uncertainty to be represented, the form of programs needed to model problems,
and on the scale of the problems to be solved.

Possibilistic Logic Programming models uncertainty by means of possibility the- 
ory rather than probability theory. Possibilistic Logic Programming aims at com- 
puting the degree of uncertainty of a query in the form of a necessity measure. Given 
a possibilistic knowledge base, inference rules have been developed for answering 
queries (Dubois and Prade 2004).

In this paper we show that an inference technique and system developed for
general PLP, called Probabilistic Inference with Tabling and Answer subsumption
(PITA), can be parameterized to efficiently reason with different measures of
uncertainty. PITA translates a general PLP program into a normal program that is
evaluated by a Prolog engine with tabling. The transformation adds an extra
argument to each subgoal to provide access to an auxiliary data structure used in
computing the uncertainty of the subgoal. The transformed program is evaluated
using tabling to memoize intermediate results and to support well-founded negation,
along with a tabling feature named answer subsumption to combine explanations
from different clauses, and a set of library predicates to interface with the auxiliary
data structure.



PITA was first presented in (Riguzzi and Swift 2010b) and addressed general
PLP using Binary Decision Diagrams (BDDs) as auxiliary data structures. That
version of PITA, termed here PITA(PROB), was compared with ProbLog, cplint
(Riguzzi 2007) and CVE (Meert et al. 2009) and found to be fast and scalable. In
this paper we first consider a parameterization called PITA(IND,EXC) and compare it
to the restricted PLP system PRISM, one of the first and most widely used systems
for PLP. Preliminary results show that PITA(IND,EXC) is faster than
PRISM on complex queries to a naive encoding of a Hidden Markov Model (HMM).
When the optimized encoding proposed by (Christiansen and Gallagher 2009) is
used, the timing results depend on the input data, with PRISM faster on random
sequences and PITA(IND,EXC) faster on repeated sequences. When adapting PITA
to compute the most probable explanation of the query (or Viterbi path), we
obtain performance similar to that of PRISM.

Moreover, we show that PITA can also be parameterized to PITA(POSS)
to compute the necessity of formulas from Possibilistic Logic Programs, and show 
the resulting implementation to be highly scalable. Together, these results show 
the versatility of the PITA algorithm, and how the implementation can be easily 
adapted to support different types of uncertain reasoning. 

The paper is organized as follows. Section 2 presents Probabilistic Logic
Programming while Section 3 discusses Possibilistic Logic Programming. Section 4
reviews tabling and answer subsumption, while Section 5 presents the PITA program
transformation and PITA(PROB). In Section 6 we describe PITA(IND,EXC) together
with experimental results on an HMM dataset. Section 7 presents PITA(POSS) for
computing necessity levels from possibilistic programs.



2 Probabilistic Logic Programming 

Various languages have been proposed in the field of Probabilistic Logic
Programming, such as for example Bayesian Logic Programs (Kersting and De Raedt 2000),
CLP(BN) (Santos Costa et al. 2003) or P-log (Baral et al. 2009). A large group of
languages follows the distribution semantics (Sato 1995) or a variant thereof. In
the distribution semantics a probabilistic logic program defines a probability
distribution over a set of normal logic programs (called worlds). The distribution is
extended to a joint distribution over worlds and queries, and the probability of a
query is obtained from this distribution by marginalization.

The languages differ in the way they define the distribution over logic programs.
Each language allows probabilistic choices among atoms in clauses: Probabilistic
Logic Programs, PHA, ICL, PRISM, and ProbLog allow probability distributions
over facts, while LPADs allow probability distributions over the heads of clauses. All
these languages have the same expressive power: there are transformations with
linear complexity that can convert each one into the others (Vennekens and Verbaeten
2003; De Raedt et al.). In this paper we will use LPADs because their syntax is the
most general.

Example 1 

The following LPAD $T_1$ captures a Markov model of length two with three states,
of which state 3 is an end state:

    $C_1$ = s(0,1) : 1/3 ∨ s(0,2) : 1/3 ∨ s(0,3) : 1/3.
    $C_2$ = s(1,1) : 1/3 ∨ s(1,2) : 1/3 ∨ s(1,3) : 1/3 ← s(0,1).
    $C_3$ = s(1,1) : 0.2 ∨ s(1,2) : 0.2 ∨ s(1,3) : 0.6 ← s(0,2).

The predicate s(T, S) models the fact that the system is in state S at time T.
Clause $C_1$ selects the first state, while clauses $C_2$ and $C_3$ select the second state
depending on the value of the first. As state 3 is the end state, if s(0,3) is selected
at time 0, no state follows.

LPADs are sets of disjunctive clauses in which each atom in the head is annotated 
with a probability. If the probabilities in the head do not sum up to 1, an extra 
dummy atom null is implicitly assumed to represent the remaining probability 
mass and is such that it does not appear in the body of any clause. A ground 
LPAD clause represents a probabilistic choice among the normal program clauses 
obtained by selecting one of the heads. 

We now define the distribution semantics for the case in which a program does not
contain function symbols so that its Herbrand base is finite¹. Let us first introduce
some terminology. An atomic choice is a selection of the $i$-th atom for a grounding
$C\theta$ of a probabilistic clause $C$ and is represented by the triple $(C, \theta, i)$.

For example, $(C_2, \{\}, 1)$ is an atomic choice selecting atom s(1,1) from $C_2$,
obtaining the clause

    s(1,1) ← s(0,1).

A set of atomic choices $\kappa$ is consistent if
$(C, \theta, i) \in \kappa, (C, \theta, j) \in \kappa \Rightarrow i = j$, i.e., only
one head is selected for a ground clause. For example,
$\kappa = \{(C_2, \{\}, 1), (C_2, \{\}, 2)\}$ is not consistent.

A composite choice $\kappa$ is a consistent set of atomic choices. The probability of
composite choice $\kappa$ is

$$P(\kappa) = \prod_{(C, \theta, i) \in \kappa} P_0(C, i)$$

where $P_0(C, i)$ is the probability annotation of head $i$ of clause $C$. A selection
$\sigma$ is a total composite choice (one atomic choice for every grounding of each
probabilistic statement/clause). For example,
$\sigma = \{(C_1, \{\}, 1), (C_2, \{\}, 1), (C_3, \{\}, 2)\}$ is a selection for $T_1$.
A selection $\sigma$ identifies a logic program $w_\sigma$, called a world. The
probability of $w_\sigma$ is
$P(w_\sigma) = P(\sigma) = \prod_{(C, \theta, i) \in \sigma} P_0(C, i)$. Since the
program does not have function symbols the set of worlds is finite:
$W_T = \{w_1, \ldots, w_m\}$, and $P(w)$ is a distribution over worlds:
$\sum_{w \in W_T} P(w) = 1$.



¹ However, the distribution semantics for programs with function symbols has been
defined as well (Sato 1995; Poole 2000; Riguzzi and Swift 2010a).
We can define the conditional probability of a query $Q$ given a world:
$P(Q \mid w) = 1$ if $Q$ is true in $w$ and $0$ otherwise. The probability of the
query can then be obtained by marginalizing over the worlds:

$$P(Q) = \sum_{w} P(Q, w) = \sum_{w} P(Q \mid w) P(w) = \sum_{w \models Q} P(w)$$
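As a small worked check of these definitions, the following Python sketch enumerates the worlds of the LPAD of Example 1 and sums the probabilities of those in which a query holds. The encoding of clauses as Python tuples and the helper names (`CLAUSES`, `least_model`, `worlds`, `query_probability`) are illustrative assumptions and are not part of PITA, which avoids this exponential enumeration.

```python
from fractions import Fraction
from itertools import product

# Ground LPAD of Example 1: each clause is (heads, annotations, body atoms).
THIRD = Fraction(1, 3)
CLAUSES = [
    (["s(0,1)", "s(0,2)", "s(0,3)"], [THIRD, THIRD, THIRD], []),
    (["s(1,1)", "s(1,2)", "s(1,3)"], [THIRD, THIRD, THIRD], ["s(0,1)"]),
    (["s(1,1)", "s(1,2)", "s(1,3)"],
     [Fraction(1, 5), Fraction(1, 5), Fraction(3, 5)], ["s(0,2)"]),
]

def least_model(rules):
    """Least Herbrand model of a ground definite program (naive fixpoint)."""
    model = set()
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in model and all(b in model for b in body):
                model.add(head)
                changed = True
    return model

def worlds():
    """Yield (probability, model) for every selection (total composite choice)."""
    for selection in product(*(range(len(c[0])) for c in CLAUSES)):
        prob = Fraction(1)
        rules = []
        for (heads, annots, body), i in zip(CLAUSES, selection):
            prob *= annots[i]
            rules.append((heads[i], body))
        yield prob, least_model(rules)

def query_probability(atom):
    """P(Q) = sum of P(w) over the worlds w in which Q is true."""
    return sum((p for p, model in worlds() if atom in model), Fraction(0))

total = sum(p for p, _ in worlds())
print(total)                        # the 27 world probabilities sum to 1
print(query_probability("s(1,1)"))  # 1/3 * 1/3 + 1/3 * 1/5 = 8/45
```

The query s(1,1) has two explanations, one through $C_1$ and $C_2$ and one through $C_1$ and $C_3$; they are mutually exclusive here because both depend on different heads of $C_1$, which is why their probabilities simply add.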

Inference in probabilistic logic programming is performed by finding explanations
for queries. An explanation is a composite choice such that the query is true in
all the worlds that are compatible with the composite choice. The query is true
if one of the explanations happens, so the query is true if the disjunction of the
explanations is true, where each explanation is interpreted as the conjunction of
all its atomic choices. Each of these choices is associated with a probability, so the
problem of computing the probability of the query is reduced to the problem
of computing the probability of a DNF formula, which is an NP-hard problem
(Kimmig et al. 2008). The most efficient way to date of solving the problem makes
use of Binary Decision Diagrams (BDDs), which represent the DNF formula
in a way that allows the probability to be computed with a simple dynamic programming
algorithm (De Raedt et al. 2007; Riguzzi 2007; Kimmig et al. 2008; Riguzzi 2008;
Riguzzi 2009; Riguzzi and Swift 2010a; Riguzzi and Swift 2010b; Riguzzi 2010).



3 Possibilistic Logic Programming 

Possibilistic Logic (Dubois et al. 1994) is a logic of uncertainty that allows
reasoning under incomplete evidence. In this logic, the degree of necessity of a formula
expresses to what extent the available evidence entails the truth of the formula and
the degree of possibility expresses to what extent the truth of the formula is not
incompatible with the available evidence.

Given a formula $\phi$, we indicate with $\Pi(\phi)$ its degree of possibility and
with $N(\phi)$ its degree of necessity. Their relation is established by
$N(\phi) = 1 - \Pi(\neg\phi)$.

A possibilistic clause is a first order logic clause $C$ to which a number is attached
taken as a lower bound of its necessity or possibility degree. We consider here
the possibilistic logic CPL1 (Dubois et al. 1991) in which only lower bounds on
necessity are considered. Thus $(C, \alpha)$ means that $N(C) \ge \alpha$. A
possibilistic theory is a set of possibilistic clauses.

A possibility measure satisfies a possibilistic clause $(C, \alpha)$ if
$N(C) \ge \alpha$ or equivalently if $\Pi(\neg C) \le 1 - \alpha$. A possibility
measure satisfies a possibilistic theory if it satisfies every clause in it. A
possibilistic clause $(C, \alpha)$ is a consequence of a possibilistic theory $F$ if
every possibility measure satisfying $F$ also satisfies $(C, \alpha)$.

Inference rules of classical logic have been extended to rules in possibilistic logic.
Here we report two sound inference rules (Dubois and Prade 2004):

• $(\phi, \alpha), (\psi, \beta) \vdash (R(\phi, \psi), \min(\alpha, \beta))$ where
$R(\phi, \psi)$ is the resolvent of $\phi$ and $\psi$ (extension of resolution)

• $(\phi, \alpha), (\phi, \beta) \vdash (\phi, \max(\alpha, \beta))$ (weight fusion)

A Possibilistic Logic Programming language has been proposed in (Dubois et al. 1991).
A Possibilistic Logic Program is a set of formulas of the form $(C, \alpha)$ where $C$
is a definite program clause

    H ← B_1, ..., B_n.

and $\alpha$ is a possibility or necessity degree. We consider the subset of this
language that is included in CPL1, i.e., $\alpha$ is a real number in $(0, 1]$ that is
a lower bound on the necessity degree of $C$. The problem of inference in this
language consists in computing the maximum value of $\alpha$ such that
$N(Q) \ge \alpha$ holds for a query $Q$. The above inference rules are complete for
this language.

Example 2 

The following possibilistic program computes the least unsure path in a graph, i.e.,
the path with maximal weight, the weight of a path being the weight of its weakest
edge (Dubois et al. 1991).

    (path(X,X), 1)
    (path(X,Y) ← path(X,Z), edge(Z,Y), 1)
    (edge(a,b), 0.3)

We restrict our discussion here to positive programs. However, we note that
approaches for normal Possibilistic Logic programs have been proposed in (Nieves et
al. 2007; Nicolas et al. 2006; Osorio and Nieves 2009) and (Bauters et al. 2010).
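To make the two inference rules concrete on Example 2, the sketch below computes the necessity of path(a, Y) as a max-min fixpoint: `min` along a derivation (extension of resolution) and `max` across derivations of the same atom (weight fusion). Only edge(a,b) with degree 0.3 comes from the example; the other edges and the function name `path_necessity` are assumptions added for illustration.

```python
def path_necessity(edges, source):
    """Necessity degree of path(source, Y) for every reachable Y.

    edges maps (u, v) to the necessity degree of edge(u, v).
    """
    necessity = {source: 1.0}          # (path(X, X), 1)
    changed = True
    while changed:                     # iterate to the least fixpoint
        changed = False
        for (u, v), weight in edges.items():
            if u not in necessity:
                continue
            # Resolution step: the derived degree is the min of the premises
            # (the recursive path clause itself has degree 1).
            candidate = min(necessity[u], weight)
            # Weight fusion: keep the max over all derivations of the atom.
            if candidate > necessity.get(v, 0.0):
                necessity[v] = candidate
                changed = True
    return necessity

# edge(a, b) = 0.3 is from Example 2; the remaining edges are hypothetical.
EDGES = {("a", "b"): 0.3, ("a", "c"): 0.8, ("c", "b"): 0.6, ("b", "d"): 0.9}
print(path_necessity(EDGES, "a"))
```

With these edges the direct edge gives path(a,b) degree 0.3, but the route through c gives min(0.8, 0.6) = 0.6, so weight fusion retains 0.6. The loop terminates on cyclic graphs as well, since degrees only increase and are drawn from the finite set of edge weights.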

4 Tabling and Answer Subsumption 

The idea behind tabling is to maintain in a table both subgoals encountered in 
a query evaluation and answers to these subgoals. If a subgoal is encountered 
more than once, the evaluation reuses information from the table rather than re- 
performing resolution against program clauses. Although the idea is simple, it has 
important consequences. First, tabling ensures termination for a wide class of pro- 
grams, and it is often easier to reason about termination with programs using 
tabling than with basic Prolog. Second, tabling can be used to evaluate programs 
with negation according to the Well-Founded Semantics (WFS). Third, for queries to wide classes of programs,
such as datalog programs with negation, tabling can achieve the optimal complex- 
ity for query evaluation. And finally, tabling integrates closely with Prolog, so that 
Prolog's familiar programming environment can be used, and no other language is 
required to build complete systems. As a result, a number of Prologs now support 
tabling including XSB, YAP, B-Prolog, ALS, and Ciao. In these systems, a predi- 
cate p/n is evaluated using SLDNF by default: the predicate is made to use tabling 
by a declaration such as table p/n that is added by the user or compiler. 

This paper makes use of a tabling feature called answer subsumption. Most
formulations of tabling add an answer A to a table for a subgoal S only if A is not
a variant (as a term) of any other answer for S. However, in many applications it
may be useful to order answers according to a partial order or (upper semi-)lattice.
As an example, consider the case of a lattice on the second argument of a binary
predicate p. Answer subsumption may be specified by means of a declaration such
as table p(_, join/3 - bottom/1), where bottom/1 returns the bottom element of the
lattice and join/3 is the join operation of the lattice. Thus if a table had an
answer p(a, d_1) and a new answer p(a, d_2) were derived, the answer p(a, d_1) would be
replaced by p(a, d_3), where d_3 is obtained by calling join(d_1, d_2, d_3). In the PITA
algorithm for LPADs presented in Section 5 the last argument of atoms is used
to store explanations for the atom in the form of BDDs and the join/3 operation
is the logical disjunction of two explanations²; under the simplifying assumptions
of PITA(IND,EXC) or/3 is simple addition; while for possibilistic logic or/3 takes
the maximum of its input arguments. Answer subsumption over arbitrary upper
semi-lattices is implemented in XSB for stratified programs (Swift 1999); in
addition, the mode-directed tabling of B-Prolog (cf. (Zhou 2011)) can also be seen as a
form of answer subsumption.
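The effect of answer subsumption on a table can be pictured with a toy Python model, given here only as an illustration: the class `SubsumptiveTable` and its method are hypothetical names, and explanations are modelled as sets rather than BDDs. Each key keeps a single answer, and a newly derived answer is merged into it with the lattice join.

```python
class SubsumptiveTable:
    """Keeps one answer per key, merged with a lattice join (a toy model)."""

    def __init__(self, join, bottom):
        self.join = join
        self.bottom = bottom
        self.answers = {}

    def add_answer(self, key, value):
        """Replace the stored answer d1 by join(d1, d2) when d2 is derived."""
        current = self.answers.get(key, self.bottom)
        self.answers[key] = self.join(current, value)
        return self.answers[key]

# Three joins mentioned in the text, each paired with its bottom element.
# Explanations are modelled as frozensets of composite-choice labels, so that
# set union plays the role of logical disjunction (PITA itself uses BDDs).
disjunction = SubsumptiveTable(lambda d1, d2: d1 | d2, frozenset())
addition = SubsumptiveTable(lambda d1, d2: d1 + d2, 0.0)
maximum = SubsumptiveTable(max, 0.0)

for table, answers in [
    (disjunction, [frozenset({"k1"}), frozenset({"k2"})]),
    (addition, [0.25, 0.5]),
    (maximum, [0.3, 0.6]),
]:
    for d in answers:
        table.add_answer("p(a)", d)

print(sorted(disjunction.answers["p(a)"]))  # ['k1', 'k2']
print(addition.answers["p(a)"])             # 0.75
print(maximum.answers["p(a)"])              # 0.6
```

Swapping the `join` and `bottom` arguments is all that distinguishes the three tables, which mirrors how the same tabling declaration serves the general, restricted and possibilistic settings.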

For function-free programs, the tabling used by the PITA system terminates
correctly for left-to-right dynamically stratified LPADs. However, we note that the
termination results of (Riguzzi and Swift 2010a) and PITA itself both apply to a much
larger class of well-defined LPADs with function symbols. As noted in Section 2, the
major probabilistic logic languages defined under the distribution semantics can be
finitely translated into one another, so that the termination and correctness results
for LPADs extend to other languages, in particular to the restricted PLP language
of Section 6. In addition, the results of (Riguzzi and Swift 2010a), which capture
termination of general probabilistic programs that give rise to multiple worlds,
directly apply to the simpler case of Possibilistic Logic Programs, which do not give
rise to multiple worlds.



5 PITA for General Probabilistic Logic Programming 

The PITA Transformation. PITA computes the probability of a query from a
probabilistic program in the form of an LPAD by first transforming the LPAD into a
normal program containing calls to manipulate uncertainty information. The idea 
is to add an extra argument to each literal to access a data structure containing 
the information that is necessary for computing the probability of the subgoal. 
The extra arguments of these literals are combined using a set of general library 
functions: 

• init, end: initialize and terminate the extra data structures necessary for
manipulating uncertainty information;

• zero(-D), one(-D), and(+D1,+D2,-DO), or(+D1,+D2,-DO), not(+D1,-DO):
Boolean operations between uncertainty information data structures;

• add_var(+N_Val,+Probs,-Var): addition of a new multi-valued random
variable with N_Val values and list of probabilities Probs;

• equality(+Var,+Value,-D): D is a data structure representing Var=Value, i.e.
that the random variable Var is assigned Value in D;

• ret_prob(+D,-P): returns the probability of the data structure D.

² The logical disjunction d_3 can be seen as subsuming d_1 and d_2 over the partial
order of implication defined on propositional formulas that represent explanations.

The auxiliary predicate get_var_n(+R,+S,+Probs,-Var) is used to wrap add_var/3
to avoid adding a new random variable when one already exists for a given clause
instantiation. As shown below, a new fact var(R,S,Var) is asserted each time a new
random variable is created: Var is an integer that identifies the random variable
associated with clause R under the grounding represented by S. get_var_n/4 has the
following definition:

    get_var_n(R, S, Probs, Var) ←
        (var(R, S, Var) → true ;
         length(Probs, L), add_var(L, Probs, Var), assert(var(R, S, Var))).

The PITA transformation applies to clauses, literals and atoms. The transformation
for a head atom H, PITA_H(H), is H with the variable D added as the last
argument. Similarly, the transformation for a body atom A_j, PITA_B(A_j), is A_j
with the variable D_j added as the last argument. The transformation for a negative
body literal L_j = ¬A_j, PITA_B(L_j), is the Prolog conditional

    (PITA'_B(A_j) → not(DN_j, D_j) ; one(D_j)),

where PITA'_B(A_j) is A_j with the variable DN_j added as the last argument. In
other words, the input data structure, DN_j, is negated if it exists; otherwise the
data structure for the constant function 1 is returned.
The disjunctive clause

    C_r = H_1 : α_1 ∨ ... ∨ H_n : α_n ← L_1, ..., L_m.

where the parameters sum to 1, is transformed into the set of clauses PITA(C_r):

    PITA(C_r, 1) = PITA_H(H_1) ← one(DD_0),
                   PITA_B(L_1), and(DD_0, D_1, DD_1), ...,
                   PITA_B(L_m), and(DD_{m-1}, D_m, DD_m),
                   get_var_n(r, VC, [α_1, ..., α_n], Var),
                   equality(Var, 1, DD), and(DD_m, DD, D).
    ...
    PITA(C_r, n) = PITA_H(H_n) ← one(DD_0),
                   PITA_B(L_1), and(DD_0, D_1, DD_1), ...,
                   PITA_B(L_m), and(DD_{m-1}, D_m, DD_m),
                   get_var_n(r, VC, [α_1, ..., α_n], Var),
                   equality(Var, n, DD), and(DD_m, DD, D).

where VC is a list containing each variable appearing in C_r.
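The shape of the transformed clauses can be checked mechanically. The following Python sketch generates the clauses PITA(C_r, 1), ..., PITA(C_r, n) as strings for a clause with positive body atoms only; the function name `pita_transform`, the representation of atoms as (functor, args) pairs and the use of `:-` for the arrow are assumptions made for this illustration and do not reflect how PITA itself is implemented.

```python
def pita_transform(r, heads, probs, body, clause_vars):
    """Return PITA(C_r, 1..n) as strings for C_r = H1:a1 v ... v Hn:an <- body.

    heads and body are lists of (functor, args) pairs for positive atoms;
    negative body literals are left out of this sketch.
    """
    def atom(functor, args, extra):
        # Add the extra data-structure argument in last position.
        return f"{functor}({', '.join(list(args) + [extra])})"

    goals = ["one(DD0)"]
    for j, (functor, args) in enumerate(body, start=1):
        goals.append(atom(functor, args, f"D{j}"))
        goals.append(f"and(DD{j - 1}, D{j}, DD{j})")
    m = len(body)
    probs_list = "[" + ", ".join(probs) + "]"
    vc = "[" + ", ".join(clause_vars) + "]"
    clauses = []
    for i, (functor, args) in enumerate(heads, start=1):
        tail = [
            f"get_var_n({r}, {vc}, {probs_list}, Var)",
            f"equality(Var, {i}, DD)",
            f"and(DD{m}, DD, D)",
        ]
        head = atom(functor, args, "D")
        clauses.append(head + " :- " + ", ".join(goals + tail) + ".")
    return clauses

# Clause C2 of Example 1: s(1,1):1/3 v s(1,2):1/3 v s(1,3):1/3 <- s(0,1).
c2 = pita_transform(
    2,
    [("s", ["1", "1"]), ("s", ["1", "2"]), ("s", ["1", "3"])],
    ["1/3", "1/3", "1/3"],
    [("s", ["0", "1"])],
    [],
)
for clause in c2:
    print(clause)
```

All n generated clauses share the same clause identifier r in get_var_n/4 and differ only in the head and in the value passed to equality/3, which is what makes the heads of one ground clause mutually exclusive values of a single random variable.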
Example 3 
Clause C1 from the LPAD of Example 1 is translated into

    s(0,1,D) ← one(DD0), get_var_n(1, [], [1/3, 1/3, 1/3], Var),
               equality(Var, 1, DD), and(DD0, DD, D).

    s(0,2,D) ← one(DD0), get_var_n(1, [], [1/3, 1/3, 1/3], Var),
               equality(Var, 2, DD), and(DD0, DD, D).

    s(0,3,D) ← one(DD0), get_var_n(1, [], [1/3, 1/3, 1/3], Var),
               equality(Var, 3, DD), and(DD0, DD, D).

All three clauses refer to the same random variable, identified by clause number 1
and the empty grounding, and each head selects a different value of that variable.

In order to answer queries, the goal genl_prob(Goal,P) is used, which is defined by

    genl_prob(Goal, P) ← init, retractall(var(_, _, _)),
                         add_d_arg(Goal, D, GoalD),
                         (call(GoalD) → ret_prob(D, P) ; P = 0.0),
                         end.

where add_d_arg(Goal, D, GoalD) implements PITA_H(Goal).

Evaluating the Transformed Program. Various predicates of the transformed
program should be declared as tabled. For a predicate p/n, the declaration is table
p(_1,...,_n,or/3-zero/1), which indicates that answer subsumption is used to form
the disjunction of multiple explanations. At a minimum, the predicate of the goal and
all the predicates appearing in negative literals should be tabled with answer
subsumption. However, it is usually better to table every predicate whose answers have
multiple explanations and are going to be reused often.

5.1 PITA Library Functions for the General Probabilistic Case 

In the case of general probabilistic programs, the data structure for representing 
probabilistic information is a Binary Decision Diagram. With such a data structure, 
we can represent the explanations for the queries in a form in which they are 
mutually exclusive and so the computation of the probability can be performed by 
an effective dynamic programming algorithm. 

The predicates that manipulate the data structure in this case manipulate BDDs.
In our implementation, these calls provide a Prolog interface to the functions in the
CUDD C library (http://vlsi.colorado.edu/~fabio/CUDD). The predicates for
interfacing with CUDD are

• init, end: for allocation and deallocation of a BDD manager, a data structure
used to keep track of the memory for storing BDD nodes;

• zero(-B), one(-B), and(+B1,+B2,-B), or(+B1,+B2,-B), not(+B1,-B):
Boolean operations between BDDs;



6 PITA(IND,EXC) 

As discussed in Section 2, general Probabilistic Logic Programming requires the
computation of the probability of DNF formulas, a difficult problem. The PRISM
system avoids this complexity by imposing special requirements on the form of a
program it can correctly evaluate. These requirements are (Sato et al. 2010):

• the probability of a conjunction (A, B) is computed as the product of the
probabilities of A and B (independence assumption);

• the probability of a disjunction (A; B) is computed as the sum of the
probabilities of A and B (exclusiveness assumption).

It is possible to write programs so that these requirements are not met. For example,
consider the program

    p ← a, b.    a : 0.3 ∨ b : 0.4.

This program does not satisfy the independence assumption because the
conjunction a, b has probability 0, since a and b are never true in the same world.
PITA(PROB) correctly gives probability 0 for p while PRISM returns probability
0.12. In this case the conjunction (a, b) is inconsistent and, while PITA(PROB)
automatically recognizes it, the inconsistency must be detected and the clause
removed for PRISM to return the correct probability. The following example also
does not satisfy the independence assumption because a and b both depend on c.
PITA(PROB) returns 0.2 for the probability of q while PRISM returns 0.04.

    q ← a, b.    a ← c.    b ← c.    c : 0.2.

As a final example, the following program violates the exclusiveness assumption
as the two clauses for the ground atom q have non-exclusive bodies:

    q ← a.    q ← b.    a : 0.2.    b : 0.4.
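The gap between the two semantics on these three programs can be reproduced with a brute-force check. The Python sketch below computes the exact probability by enumerating worlds and compares it with the value obtained by multiplying for conjunctions and adding for disjunctions. The encoding of each probabilistic fact as a named choice with an explicit `null` alternative, and the function name `exact_probability`, are assumptions of this illustration; the values 0.52 and 0.6 for the third program are computed here and are not reported in the text.

```python
from itertools import product

def exact_probability(choices, query):
    """Sum the probabilities of the worlds in which query(world) holds.

    choices maps a choice name to {value: probability}; a world assigns one
    value to every (independent) probabilistic choice.
    """
    names = list(choices)
    total = 0.0
    for values in product(*(choices[n].items() for n in names)):
        world = {n: v for n, (v, _) in zip(names, values)}
        prob = 1.0
        for _, p in values:
            prob *= p
        if query(world):
            total += prob
    return total

# p <- a, b.   a : 0.3 v b : 0.4.   (a and b are alternatives of one choice)
prog1 = {"c1": {"a": 0.3, "b": 0.4, "null": 0.3}}
exact1 = exact_probability(prog1, lambda w: w["c1"] == "a" and w["c1"] == "b")
ind1 = 0.3 * 0.4                      # independence assumption

# q <- a, b.   a <- c.   b <- c.   c : 0.2.
prog2 = {"c": {"c": 0.2, "null": 0.8}}
exact2 = exact_probability(prog2, lambda w: w["c"] == "c")
ind2 = 0.2 * 0.2                      # treats a and b as independent

# q <- a.   q <- b.   a : 0.2.   b : 0.4.
prog3 = {"a": {"a": 0.2, "null": 0.8}, "b": {"b": 0.4, "null": 0.6}}
exact3 = exact_probability(prog3, lambda w: w["a"] == "a" or w["b"] == "b")
exc3 = 0.2 + 0.4                      # exclusiveness assumption

for exact, restricted in [(exact1, ind1), (exact2, ind2), (exact3, exc3)]:
    print(round(exact, 10), round(restricted, 10))
```

In the third program the sum counts the worlds in which both a and b hold twice, so the restricted computation overestimates the exact value 1 - 0.8 * 0.6.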

These restrictions required by PRISM considerably simplify the computation since
we can now ignore the dependencies between the explanations of different subgoals.
PITA can be optimized for PRISM-style programs by simplifying the program
transformation it uses, and by implementing simpler library functions. The clause
C_r = H_1 : α_1 ∨ ... ∨ H_n : α_n ← L_1, ..., L_m is transformed into the set of
clauses PITA^P(C_r):

    PITA^P(C_r, 1) = PITA_H(H_1) ← one(DD_0),
                     PITA_B(L_1), and(DD_0, D_1, DD_1), ...,
                     PITA_B(L_m), and(DD_{m-1}, D_m, DD_m),
                     equality([α_1, ..., α_n], 1, DD),
                     and(DD_m, DD, D).
    ...
    PITA^P(C_r, n) = PITA_H(H_n) ← one(DD_0),
                     PITA_B(L_1), and(DD_0, D_1, DD_1), ...,
                     PITA_B(L_m), and(DD_{m-1}, D_m, DD_m),
                     equality([α_1, ..., α_n], n, DD),
                     and(DD_m, DD, D).

The auxiliary data structure stored in the extra subgoal argument is no longer a
BDD, but simply a real number that represents the probability of a ground
instantiation of that subgoal. The library functions are now simple Prolog predicates.

    equality(Probs, N, P) ← nth(N, Probs, P).
    or(A, B, C) ← C is A + B.        and(A, B, C) ← C is A * B.
    not(P, P1) ← P1 is 1 - P.
    zero(0.0).        one(1.0).
    ret_prob(P, P).

We call the resulting algorithm PITA(IND,EXC).

An example of a program satisfying the PRISM requirements encodes a Hidden
Markov Model (HMM), a graphical model with a sequence of unobserved state
variables, a sequence of observed output variables, and where each state variable
depends only on its preceding state. HMMs have a wide range of applications,
including the modeling of DNA sequences. The following program, taken from
(Christiansen and Gallagher 2009), models DNA sequences using three states:

    hmm(O) ← hmm1(_, O).
    hmm1(S, O) ← hmm(q1, [], S, O).
    hmm(end, S, S, []).
    hmm(Q, S0, S, [L|O]) ← Q \= end, succ(Q, Q1, S0), out(Q, L, S0),
                           hmm(Q1, [Q|S0], S, O).
    succ(q1,q1,_S):1/3 ∨ succ(q1,q2,_S):1/3 ∨ succ(q1,end,_S):1/3.
    succ(q2,q1,_S):1/3 ∨ succ(q2,q2,_S):1/3 ∨ succ(q2,end,_S):1/3.
    out(q1,a,_S):1/4 ∨ out(q1,c,_S):1/4 ∨ out(q1,g,_S):1/4 ∨ out(q1,t,_S):1/4.
    out(q2,a,_S):1/4 ∨ out(q2,c,_S):1/4 ∨ out(q2,g,_S):1/4 ∨ out(q2,t,_S):1/4.

In order to investigate the relative performance of PITA(IND,EXC) and PRISM,
we computed the execution time of queries to hmm/1 for increasing lengths of the
output sequence. Sequences used in Figure 1(a) are randomly generated, while those
in Figure 1(b) are repetitions of the sequence a,c,g,t. (Version 2.0 of PRISM was
used in all the experiments.) In both cases, the costs for both algorithms grow
exponentially. Times for both systems are close for sequence lengths N up to 11;
however, beyond N = 12, PITA(IND,EXC) begins to scale somewhat better than PRISM,
answering queries through N = 18 while PRISM can answer queries only through N = 14.
Beyond those numbers, both systems throw memory errors.

(Christiansen and Gallagher 2009) proposed a technique for speeding up query
answering by removing non-discriminating arguments. These are arguments that
play no role in determining the control flow of a logic program with respect to
goals satisfying given mode and sharing restrictions. The computation trees of the
resulting program are isomorphic to those of the original program and the results
of the original program can be reconstructed from a trace of the transformed
program. The authors show that the removal of non-discriminating arguments is very
useful with tabling because the calls to a tabled predicate differing only in the
non-discriminating arguments will merge into a single table that is much smaller
and has a higher chance of reuse. After removing non-discriminating arguments,
the HMM program above becomes

    hmm(O) ← hmm(q1, O).
    hmm(end, []).
    hmm(Q, [L|O]) ← Q \= end, succ(Q, Q1), out(Q, L), hmm(Q1, O).

plus the clauses defining succ/2 and out/2.

[Figure 1: (a) PITA(IND,EXC) and PRISM on random sequences. (b) PITA(IND,EXC)
and PRISM on repeated sequences.]

Fig. 1. Times for computing P(hmm(<seq>)) as a function of sequence length.
Missing points at the beginning of the X-axis correspond to a time smaller than
$10^{-\ldots}$ seconds, missing points at the end of the X-axis correspond to a memory error.
The experiments were performed on a Core 2 Duo E6550 (2333 MHz) processor.

[Figure 2: (a) PITA(IND,EXC) and PRISM on random sequences. (b) PITA(IND,EXC)
and PRISM on repeated sequences.]

Fig. 2. Times for computing P(hmm(<seq>)) as a function of sequence length
(reduced program with non-discriminating arguments removed). The experiments
were performed on a Core 2 Duo E6550 (2333 MHz) processor.



Figures 2(a) and 2(b) show the computation time for PITA(IND,EXC) and
PRISM on the reduced HMM program as a function of the sequence length for
randomly generated and repeating sequences. For random sequences, PITA(IND,EXC)
and PRISM are competitive, with PRISM slightly faster; however, for the repeating
sequences PITA(IND,EXC) is much faster, and in fact scales well up to input
sequences of length $O(10^{\ldots})$. The scalability of PITA(IND,EXC) on
repeated sequences is apparently due to XSB's use of trie-based tables, which
allow good indexing and space sharing for repeating subsequences. The tabling of
PRISM, which is based on hash tables, loses discrimination in this case.
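For readers who want to see what PITA(IND,EXC) computes on the reduced HMM, the following Python sketch mirrors the evaluation: a memoized function stands in for the tabled predicate hmm/2, products stand in for and/3 and sums for or/3. This is an illustration under those assumptions only; the names `SUCC`, `OUT`, `hmm` and `hmm_query` are hypothetical, and the sketch says nothing about the trie-based tables of XSB or about timing.

```python
from fractions import Fraction
from functools import lru_cache

STATES = ("q1", "q2")
# succ/2 and out/2 annotations of the HMM program (1/3 and 1/4 throughout).
SUCC = {q: {"q1": Fraction(1, 3), "q2": Fraction(1, 3), "end": Fraction(1, 3)}
        for q in STATES}
OUT = {q: {letter: Fraction(1, 4) for letter in "acgt"} for q in STATES}

@lru_cache(maxsize=None)   # plays the role of the table for hmm/2
def hmm(state, outputs):
    """Probability that the model, starting in `state`, emits `outputs`."""
    if not outputs:
        return Fraction(1) if state == "end" else Fraction(0)
    if state == "end":
        return Fraction(0)
    letter, rest = outputs[0], outputs[1:]
    total = Fraction(0)                       # zero/1
    for nxt, p_succ in SUCC[state].items():
        # and/3 is a product (independence); or/3 is a sum (exclusiveness).
        total += p_succ * OUT[state][letter] * hmm(nxt, rest)
    return total

def hmm_query(outputs):
    """Counterpart of hmm(O) <- hmm(q1, O)."""
    return hmm("q1", tuple(outputs))

print(hmm_query("a"))    # 1/3 * 1/4 = 1/12
print(hmm_query("ac"))   # 2 * (1/12) * (1/12) = 1/72
```

Because the memo key is the pair (state, remaining outputs), a repeated input such as a,c,g,t,a,c,g,t,... still yields one entry per distinct suffix, so the sketch does not by itself reproduce the sharing effect attributed to tries above.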

Computing the Viterbi Path. In HMMs, it is common to look for the sequence of
state values that most likely gave the output sequence, also called the Viterbi
path, while the probability of this sequence of states is called the Viterbi
probability. This is equivalent to finding the most probable explanation for
the goal.

The Viterbi path and probability are computed by PRISM with the viterbif/3
predicate, but they can also be computed by PITA(IND,EXC) by modifying it so
that the probability data structure includes not only the highest probability
of the subgoal but also the most probable explanation for the subgoal. In this
case the support predicates are modified as follows:

equality(R,S,Probs,N,e([(R,S,N)],P)) :- nth(N,Probs,P).
or(e(E1,P1),e(_E2,P2),e(E1,P1)) :- P1 >= P2, !.
or(e(_E1,_P1),e(E2,P2),e(E2,P2)).
and(e(E1,P1),e(E2,P2),e(E3,P3)) :- P3 is P1*P2, append(E1,E2,E3).
zero(e(null,0)). one(e([],1)). ret_prob(B,B).

In this way we obtain PITAVIT(IND), which is also sound if the exclusiveness 
assumption does not hold. 
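
The behavior of these support predicates can be sketched in Python as follows.
This is only an illustration of the max-product combination, not the XSB
implementation: the function names, the tuple encoding of e(Explanation,Prob)
and the treatment of a conjunction with the zero element are assumptions made
for the example.

```python
ZERO = (None, 0.0)   # zero(e(null,0)): no explanation found
ONE = ((), 1.0)      # one(e([],1)): the empty explanation


def or_(x, y):
    """Keep the more probable alternative; ties keep the first one."""
    return x if x[1] >= y[1] else y


def and_(x, y):
    """Concatenate explanations and multiply their probabilities."""
    if x[0] is None or y[0] is None:
        return ZERO  # assumption: conjunction with 'no explanation' is empty
    return (x[0] + y[0], x[1] * y[1])


def most_probable(explanations):
    """Fold explanations, each a list of (choice, probability) pairs."""
    best = ZERO
    for explanation in explanations:
        item = ONE
        for choice, prob in explanation:
            item = and_(item, ((choice,), prob))
        best = or_(best, item)
    return best
```

Since or_ keeps a single maximum instead of summing, the result does not depend
on the alternatives being mutually exclusive, which is consistent with the
remark that exclusiveness is not needed here.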

Figures 3(a) and 3(b) show times for PITAVIT(IND) and PRISM to compute Viterbi
paths and probabilities on the reduced HMM program. PITAVIT(IND) is slower than
PRISM for short random sequences and roughly as fast on long sequences. On
repeated sequences it is much more scalable.

Fig. 3. Times for computing the Viterbi path and probability of hmm(<seq>) as
a function of sequence length (reduced program with non-discriminating
arguments removed): (a) PITAVIT(IND) and PRISM on random sequences;
(b) PITAVIT(IND) and PRISM on repeated sequences. The timings were taken on an
Intel Core i5 (2.53 GHz) processor.

Counting Explanations. PITA(IND,EXC) can be used to count explanations for
goals with a slight modification when explanations for different goals are not
incompatible. To obtain PITA(COUNT), the only auxiliary predicate to be
modified is equality/3: equality(_Probs,_N,1).
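
A minimal Python sketch of the counting idea, assuming a small hypothetical
acyclic graph that is not taken from the experiments: every choice contributes
1 (the role of the modified equality/3), conjunction multiplies and disjunction
adds, so the value computed for a goal is its number of explanations.

```python
from functools import lru_cache

# Hypothetical acyclic graph (illustration only).
EDGES = {"a": ("b", "c"), "b": ("c", "d"), "c": ("d",), "d": ()}


@lru_cache(maxsize=None)  # stands in for tabling
def count_paths(node, target):
    """Number of distinct paths from node to target."""
    if node == target:
        return 1          # one: the empty explanation
    total = 0             # zero: no explanation yet
    for nxt in EDGES[node]:
        # Each edge choice counts 1; 'and' is *, 'or' is +.
        total += 1 * count_paths(nxt, target)
    return total
```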



7 Application to Possibilistic Logic Programming 

PITA can also be used to perform inference in Possibilistic Logic Programming,
where a program is composed only of clauses of the form H : α ← B1, ..., Bn,
which we interpret as possibilistic clauses of the form (H ← B1, ..., Bn, α).
For space reasons we do not discuss negation here; however, the publicly
available version of PITA computes possibilistic programs that are
left-to-right dynamically stratified (Section 4) according to the semantics of
(Bauters et al. 2010).

The transformation PITA^ used for the PRISM optimization can be used un- 
changed provided the support predicates are defined as 

equality([P,_P0],_N,P).
or(A,B,C) :- C is max(A,B).
and(A,B,C) :- C is min(A,B).
zero(0.0). one(1.0). ret_prob(P,P).

We obtain in this way PITA(POSS). The input list of the equality/3 predicate
contains two numbers because we used the same preprocessing code as for LPADs.
Specializing the transformation for possibilistic logic programs would remove
the need for the equality/3 predicate.
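
To make the max/min combination concrete, the following Python sketch computes
the possibility of reaching one node from another in a small weighted graph.
The graph, its weights and the function name are hypothetical; the explicit
visited set is a simplification that replaces the tabled evaluation used by
PITA(POSS).

```python
# Hypothetical possibilistic edges (illustration only).
EDGES = {("a", "b"): 0.9, ("b", "c"): 0.4, ("a", "c"): 0.3,
         ("c", "d"): 0.8, ("b", "a"): 0.7}


def possibility(src, dst, visited=frozenset()):
    """Max over loop-free paths of the min edge weight along the path."""
    if src == dst:
        return 1.0                      # one(1.0)
    best = 0.0                          # zero(0.0)
    for (u, v), weight in EDGES.items():
        if u == src and v not in visited:
            rest = possibility(v, dst, visited | {src})
            # and = min along a path, or = max across alternative paths
            best = max(best, min(weight, rest))
    return best
```

In this example the path a, b, c, d has possibility min(0.9, 0.4, 0.8) = 0.4
and the path a, c, d has possibility min(0.3, 0.8) = 0.3, so the query from a
to d evaluates to 0.4.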

To experiment with PITA(POSS), we consider the networks of biological concepts
of (De Raedt et al. 2007) and the definition of path/2 of Example [H. In these
networks the nodes encode biological entities and the edges conceptual
relations among them. In each program the edges are associated with a real
number. The programs have been sampled from a very large graph and contain
200, 400, ..., 10000 edges. Sampling was repeated ten times, to obtain ten
series of programs of increasing size. In each program we query the possibility
that the two genes HGNC_620 and HGNC_983 are related.

We use PITA(COUNT) to compute the number of explanations for the query in the
first series of programs. In this problem, an explanation is a path from source
to target that does not contain loops. In fact, paths with loops are subsumed
by paths without loops, so they do not contribute to the overall probability.
Table 1 shows the number of paths for the networks in series 1 for which the
computation terminated within 24 hours. The number of paths grows very fast.



Table 1. Number of paths.

| Edges        | 200 | 400 | 600 | 800   | 1000  | 1200    |
| Explanations | 10  | 42  | 380 | 1,280 | 3,480 | 612,140 |



Figure 4(a) shows the average over the ten series of the execution time for
computing the possibility of path('HGNC_620','HGNC_983') as a function of the
number of edges. Figure 4(b) shows the number of graphs solved for each graph
size. These figures also contain data for PITA(PROB), for the equivalent
deterministic program (i.e., computing whether there is a path between nodes)
and for the system posSmodels (Nicolas et al. 2006)^1. As these figures show,
computing the possibility is much easier than computing the general
probability, which must solve the disjoint sum problem to obtain answers. With
respect to the posSmodels system, PITA(POSS) is faster for smaller graphs and
slower for larger ones, but the averages of posSmodels have been computed on
fewer graphs, since on some it gave an out-of-memory error.

^1 For PITA(PROB), we used the definition of path of (Kimmig et al. 2008)
because it gave smaller timings. PITA(IND,EXC) was not tested because this
problem does not satisfy the independence and exclusiveness requirements.

Fig. 4. Results of the experiments on the biological networks. The experiments
were performed on an Intel Core 2 Duo E6550 (2333 MHz) processor with 4 GB of
RAM.

8 Conclusions 

We have shown how the probabilistic inference system PITA can be easily adapted
to different settings. In particular, we have considered programs that respect
the independence and exclusiveness assumptions required by PRISM and shown how
PITA can be modified to exploit these assumptions. Preliminary results show the
algorithm to be faster than PRISM for complex queries to a naive encoding of an
HMM, while the performance on an optimized encoding depends on the input data.
Moreover, PITA can also be used for computing the Viterbi path, i.e., the most
probable explanation for a goal. Finally, we have shown how PITA can be
modified to perform inference on Possibilistic Logic Programs.

PITA is a supported package in version 3.3 of XSB, and handles programs that 
include both negation and function symbols. Because PITA consists of a program 
transformation plus library functions that implement an API for answer subsump- 
tion, the approaches of general PLP, restricted PLP and Possibilistic Logic Pro- 
gramming can be combined within a single program. Thus, if it is known that, say, 
predicates in a given module satisfy independence and exclusiveness assumptions, 
the module can use PITA(IND,EXC) and avoid the expense of BDD maintenance. 
Furthermore, simple modifications to PITA would allow the use of general vs. re- 
stricted PLP to be decided on a predicate basis, possibly supported in the future 
by an optimizing compiler that could check exclusiveness of clauses, and indepen- 
dence of literals within the body of a clause. This approach is not only general, 
but portable. For Prologs that implement tabling, the additional effort needed for 
answer subsumption is relatively small so that implementations of PITA need not 
be restricted to XSB. 

Finally, we believe that the techniques presented can also be applied to Soft
Constraint Logic Programming (SCLP) (Bistarelli and Rossi 2001), as advocated
in (Bistarelli et al. 2007). In this case, PITA's API to answer subsumption
would interface with a constraint handling system rather than with BDDs or
simple Prolog predicates. In fact, PITA, PITA(IND,EXC) and PITA(POSS) can be
seen as implementing SCLP over the semirings (V, ∨, ∧, false, true),
([0,1], +, ×, 0, 1) and ([0,1], max, min, 0, 1) respectively, where V is the
set of propositional formulas built over a fixed and finite set of
propositions.
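
The semiring view can be sketched in Python by evaluating the same set of
explanations under two of these semirings. The dictionary SEMIRINGS, the
function evaluate and the representation of an explanation as a list of numbers
are assumptions made for this example; the semiring of propositional formulas,
which PITA handles with BDDs, is left out.

```python
from functools import reduce

# (plus, times, zero, one) for the two numeric semirings named above.
SEMIRINGS = {
    "IND,EXC": (lambda a, b: a + b, lambda a, b: a * b, 0.0, 1.0),
    "POSS": (max, min, 0.0, 1.0),
}


def evaluate(name, explanations):
    """Combine explanations: 'times' within one, 'plus' across them."""
    plus, times, zero, one = SEMIRINGS[name]
    values = (reduce(times, explanation, one) for explanation in explanations)
    return reduce(plus, values, zero)
```

For instance, with the explanations [0.5, 0.4] and [0.3], the first semiring
combines the products 0.2 and 0.3 by addition, while the second combines the
minima 0.4 and 0.3 by taking their maximum.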